WO 2005/004159 PCT/JP2004/009782



1. A video processing apparatus for specifying frames to be 
start frames of a plurality of viewing segments when segmenting 
a content, comprising: 

a specifying information memory storing pieces of 
specifying information each showing a feature of frames to 
be specified as start frames and each corresponding to a 
different type of content; 

a content obtaining unit operable to obtain a content; 

an information obtaining unit operable to obtain type 
information showing the type of the obtained content; 

an extracting unit operable to extract from the 
specifying information memory a piece of specifying 
information corresponding to the type shown by the obtained 
type information; and 

a specifying unit operable to specify start frames 
present in the content, in accordance with the extracted piece 
of specifying information. 
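
As a minimal sketch of the flow recited in Claim 1, the specifying information memory can be modelled as a table keyed by type of content. The frame representation, the feature names (`scene_change`, `music_start`), and the type names are assumptions made for illustration only; the claim does not prescribe any of them.

```python
from typing import Callable, Dict, List, Sequence

# A frame is represented here only by a dict of precomputed features (hypothetical).
Frame = Dict[str, bool]
# A piece of specifying information is modelled as a predicate on a frame.
SpecifyingInfo = Callable[[Frame], bool]

# Hypothetical specifying information memory: one entry per type of content.
SPECIFYING_INFO_MEMORY: Dict[str, SpecifyingInfo] = {
    "news": lambda f: f.get("scene_change", False),
    "music": lambda f: f.get("music_start", False),
}


def specify_start_frames(content: Sequence[Frame], content_type: str) -> List[int]:
    """Return the indices of frames specified as start frames."""
    # Extracting unit: pick the piece of specifying information for this type.
    info = SPECIFYING_INFO_MEMORY[content_type]
    # Specifying unit: keep the frames that show the feature.
    return [i for i, frame in enumerate(content) if info(frame)]


content = [
    {"scene_change": True},
    {"music_start": True},
    {"scene_change": True, "music_start": True},
]
news_starts = specify_start_frames(content, "news")
music_starts = specify_start_frames(content, "music")
```

The same content yields different start frames depending on the type information, which is the point of keying the memory by type.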

2. The video processing apparatus of Claim 1, wherein 

each piece of specifying information further shows a 
feature of frames to be specified as presentation frames, 
each of which is to be displayed as a representative still 
image of a respective viewing segment, and 

the specifying unit further specifies presentation 
frames present in the content, in accordance with the extracted 
piece of specifying information. 

3. The video processing apparatus of Claim 2, further 
comprising:

an index storage unit operable to store, in 
correspondence with the content, display times of each start 
frame and presentation frame specified by the specifying unit.

4. The video processing apparatus of Claim 2, wherein 

the features shown by the specifying information are 
detectable through at least one of video analysis, still image 
analysis, and audio analysis, and 

the specifying unit specifies the start frames and 
presentation frames through at least one of video analysis, 
still image analysis, and audio analysis. 

5. The video processing apparatus of Claim 4, wherein 

the specifying information includes: 

a first condition showing a feature of frames to 
be detected as candidates for presentation frames; 

an exclusion condition showing a feature of frames 
to be excluded from candidates for presentation frames; 

a second condition showing a feature of frames to 
be detected as candidates for start frames; and 

a selection condition showing a relation between 
a presentation frame and a frame that is to be selected as 
a start frame, and 

the specifying unit specifies the presentation frames 
by detecting frames satisfying the first condition from all 
frames present in the content and subsequently excluding 
frames satisfying the exclusion condition from the detected 
frames, and specifies the start frames by detecting frames 
satisfying the second condition from all the frames present 
in the content and subsequently selecting, from the detected
frames, frames satisfying the relation shown by the selection
condition with respect to the specified presentation frames.

6. The video processing apparatus of Claim 5, wherein 

the specifying unit includes: 

a plurality of detecting subunits each operable to 
detect frames having a different feature; 

an excluding subunit operable to exclude frames 
satisfying the exclusion condition from frames satisfying 
the first condition; and 

a selecting subunit operable to select frames 
satisfying the relation shown by the selection condition from 
frames satisfying the second condition; and

the first condition, the exclusion condition, and the 
second condition each are an identifier of one of the detecting 
subunits to be used. 
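
The arrangement of Claims 5 and 6 can be sketched as follows, assuming that each detecting subunit returns a set of frame indices and that the selection condition is "closest preceding". The detector identifiers, the canned detection results, and the reading of "preceding" as "at or before" are all assumptions for illustration.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical detecting subunits, keyed by identifier. Each returns the
# indices of frames having one feature; results are canned for illustration.
DETECTORS: Dict[str, Callable[[], Set[int]]] = {
    "large_caption_start": lambda: {10, 40, 70},
    "cm": lambda: {40, 41, 42},
    "transition": lambda: {0, 8, 35, 60},
}

# Specifying information: three detector identifiers plus a selection condition.
SPEC_INFO: Dict[str, str] = {
    "first": "large_caption_start",
    "exclusion": "cm",
    "second": "transition",
    "selection": "closest_preceding",
}


def specify(info: Dict[str, str]) -> Tuple[List[int], List[int]]:
    """Return (presentation frames, start frames) as sorted index lists."""
    if info["selection"] != "closest_preceding":
        raise ValueError("only 'closest_preceding' is sketched here")
    # Excluding subunit: first-condition frames minus exclusion-condition frames.
    presentation = sorted(
        DETECTORS[info["first"]]() - DETECTORS[info["exclusion"]]()
    )
    # Selecting subunit: for each presentation frame, take the closest
    # second-condition frame located at or before it (assumed reading).
    candidates = sorted(DETECTORS[info["second"]]())
    starts: List[int] = []
    for p in presentation:
        k = bisect_right(candidates, p)
        if k > 0:
            starts.append(candidates[k - 1])
    return presentation, starts


presentation_frames, start_frames = specify(SPEC_INFO)
```

Because the conditions are only identifiers, a new type of content can be supported by adding a table entry rather than new detection code.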

7. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of content, 
the specifying unit (i) detects from all the frames present 
in the content, large-caption start frames each of which is 
a first frame of a series of frames during which a caption 
of a size larger than a threshold continuously appears in 
a predetermined region, small-caption frames in each of which
a caption of a size smaller than a threshold appears in a 
region other than the predetermined region, CM frames which 
constitute a commercial message, and transition frames each 
of which is a first frame of a series of frames of similar
images, (ii) specifies as a presentation frame each frame
remaining after removing the small-caption frames and the 
CM frames from the large-caption start frames, and (iii) 
specifies as a start frame, for each presentation frame, a 
closest preceding transition frame to the presentation frame.
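
The "first frame of a series of frames" language in Claim 7 amounts to finding where each run of qualifying frames begins. A minimal sketch, assuming a per-frame caption size for the predetermined region has already been measured (`None` meaning no caption); the function name and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence


def large_caption_start_frames(
    sizes: Sequence[Optional[float]], threshold: float
) -> List[int]:
    """Return the first frame index of each run of large-caption frames."""
    starts: List[int] = []
    in_run = False
    for i, size in enumerate(sizes):
        large = size is not None and size > threshold
        if large and not in_run:
            starts.append(i)  # a new run begins at this frame
        in_run = large
    return starts


# Caption size measured in the predetermined region, one value per frame.
sizes = [None, 30.0, 32.0, None, 12.0, 28.0, 28.0]
run_starts = large_caption_start_frames(sizes, threshold=20.0)
```

The same run-start scan would apply to transition frames, music-start frames, or speech-start frames, given a different per-frame test.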

8. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying
information corresponding to a predetermined type of content, 
the specifying unit (i) excludes frames which constitute a 
commercial message from all the frames present in the content, 
(ii) detects from the remaining frames, large-caption start 
frames each of which is a first frame of a series of frames 
during which a caption of a size larger than a threshold 
continuously appears in a predetermined region, small-caption
frames in each of which a caption of a size smaller than a 
threshold appears in a region other than the predetermined 
region, and transition frames each of which is a first frame 
of a series of frames of similar images, (iii) specifies as 
a presentation frame each frame remaining after removing the 
small-caption frames from the large-caption start frames, 
and (iv) specifies as a start frame, for each presentation 
frame, a closest preceding transition frame to the 
presentation frame. 

9. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of content, 
the specifying unit (i) detects from all the frames present 
in the content, large-caption start frames each of which is a
first frame of a series of frames during which a caption of
a size larger than a threshold continuously appears in a
predetermined region, small-caption frames in each of which
a caption of a size smaller than a threshold appears in a 
region other than the predetermined region, CM frames which 
constitute a commercial message, and silent frames of which 
audio data is below a predetermined volume level, (ii) 
specifies as a presentation frame each frame remaining after 
removing the small-caption frames and the CM frames from the 
large-caption start frames, and (iii) specifies as a start 
frame, for each presentation frame, a closest silent frame 
to the presentation frame. 

10. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of content, 
the specifying unit (i) excludes frames which constitute a 
commercial message from all the frames present in the content, 
(ii) detects from the remaining frames, large-caption start 
frames each of which is a first frame of a series of frames 
during which a caption of a size larger than a threshold 
continuously appears in a predetermined region, small-caption
frames in each of which a caption of a size smaller than a 
threshold appears in a region other than the predetermined 
region, and silent frames of which audio data is below a 
predetermined volume level, (iii) specifies as a presentation 
frame each frame remaining after removing the small-caption 
frames from the large-caption start frames, and (iv) specifies 
as a start frame, for each presentation frame, a closest 
preceding silent frame to the presentation frame. 

11. The video processing apparatus of Claim 4, wherein

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of music 
program, the specifying unit (i) detects from all the frames 
present in the content, large-caption start frames each of 
which is a first frame of a series of frames during which 
a caption of a size larger than a threshold continuously appears 
in a predetermined region, small-caption frames in each of
which a caption of a size smaller than a threshold appears 
in a region other than the predetermined region, CM frames 
which constitute a commercial message, and music-start frames 
each of which is a first frame of a series of frames of which 
audio data represents a piece of music data, (ii) specifies 
as a presentation frame each frame remaining after removing 
the small-caption frames and CM frames from the large-caption 
start frames, and (iii) specifies as a start frame, for each 
presentation frame, a closest preceding music-start frame 
to the presentation frame. 

12. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of music 
program, the specifying unit (i) excludes frames which
constitute a commercial message from all the frames present 
in the content, (ii) detects from the remaining frames, 
large-caption start frames each of which is a first frame 
of a series of frames during which a caption of a size larger 
than a threshold continuously appears in a predetermined 
region, small-caption frames in each of which a caption of
a size smaller than a threshold appears in a region other
than the predetermined region, and music-start frames each 
of which is a first frame of a series of frames of which audio 
data represents a piece of music data, (iii) specifies as 
a presentation frame each frame remaining after removing the 
small-caption frames from the large-caption start frames, 
and (iv) specifies as a start frame, for each presentation 
frame, a closest preceding music-start frame to the 
presentation frame.

13. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of content, 
the specifying unit (i) detects from all the frames present 
in the content, large-caption start frames each of which is 
a first frame of a series of frames during which a caption 
of a size larger than a threshold continuously appears in 
a predetermined region, small-caption frames in each of which
a caption of a size smaller than a threshold appears in a
region other than the predetermined region, CM frames which
constitute a commercial message, and speech-start frames
each of which is a first frame of a series of frames of which 
audio data represents a speech of a specific speaker, (ii) 
specifies as a presentation frame each frame remaining after 
removing the small-caption frames and the CM frames from the 
large-caption start frames, and (iii) specifies as a start 
frame, for each presentation frame, a closest preceding 
speech-start frame to the presentation frame. 

14. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying
information corresponding to a predetermined type of content, 
the specifying unit (i) excludes frames which constitute a 
commercial message from all the frames present in the content, 
(ii) detects from the remaining frames, large-caption start 
frames each of which is a first frame of a series of frames 
during which a caption of a size larger than a threshold 
continuously appears in a predetermined region, small-caption
frames in each of which a caption of a size smaller than a 
threshold appears in a region other than the predetermined 
region, and speech-start frames each of which is a first frame 
of a series of frames of which audio data represents a speech 
of a specific speaker, (iii) specifies as a presentation frame
each frame remaining after removing the small-caption frames
from the large-caption start frames, and (iv) specifies as
a start frame, for each presentation frame, a closest preceding
speech-start frame to the presentation frame. 

15. The video processing apparatus of Claim 4, wherein 

when operating in accordance with a piece of specifying 
information corresponding to a predetermined type of content, 
the specifying unit (i) detects from all the frames present 
in the content, CM-start frames each of which is a first frame 
of a series of frames which constitute a commercial message, 
and transition frames each of which is a first frame of a 
series of frames of similar images, (ii) specifies each 
CM-start frame as a start frame, and (iii) specifies as a 
presentation frame, for each start frame, a closest subsequent 
transition frame to the start frame. 
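
Claim 15 reverses the direction of the search: the start frame is fixed first and the presentation frame is looked up after it. A minimal sketch, assuming frame indices as input and reading "subsequent" as strictly later than the start frame (a CM-start frame is often itself a transition frame); both the function name and that reading are assumptions.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple


def pair_cm_segments(
    cm_starts: Sequence[int], transitions: Sequence[int]
) -> List[Tuple[int, Optional[int]]]:
    """Pair each CM-start frame with its closest subsequent transition frame."""
    ordered = sorted(transitions)
    pairs: List[Tuple[int, Optional[int]]] = []
    for start in sorted(cm_starts):
        # First transition frame strictly after the start frame (assumed reading).
        k = bisect_right(ordered, start)
        presentation = ordered[k] if k < len(ordered) else None
        pairs.append((start, presentation))
    return pairs


pairs = pair_cm_segments([100, 400], [100, 130, 250, 400, 445])
```

A start frame with no later transition frame is paired with `None` here; how the apparatus treats that case is not stated in the claim.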

16. The video processing apparatus of Claim 2, further
comprising: 

a playback unit operable to play back the content starting 
from a start frame specified by the specifying unit. 

17. The video processing apparatus of Claim 16, further 
comprising: 

an index storing unit operable to store pairs of display
times of each start frame and presentation frame specified 
for a respective viewing segment by the specifying unit; 

a display unit operable to display a presentation frame 
specified for each viewing segment by the specifying unit; 
and 

a user-selection unit operable to select at least one 
of the presentation frames displayed, in accordance with a 
user selection, wherein 

the playback unit plays back the content starting from 
a start frame of a viewing segment to which the user-selected 
presentation frame belongs. 
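
The lookup recited in Claim 17 can be sketched with one index entry per viewing segment. The `IndexEntry` type, the use of seconds as the time unit, and the sample times are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexEntry:
    """Display times, in seconds, of one viewing segment's paired frames."""
    start_time: float
    presentation_time: float


def playback_position(index: List[IndexEntry], selected: int) -> float:
    """Return the display time at which playback should begin.

    `selected` is the position of the user-selected presentation frame
    in the displayed list, which follows the order of `index`.
    """
    return index[selected].start_time


index = [
    IndexEntry(start_time=0.0, presentation_time=4.5),
    IndexEntry(start_time=92.0, presentation_time=97.2),
    IndexEntry(start_time=240.5, presentation_time=243.0),
]
position = playback_position(index, selected=1)
```

Playback begins at the start frame rather than at the displayed presentation frame, so the viewer sees the segment from its beginning.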

18. The video processing apparatus of Claim 17, wherein 

the display unit displays the presentation frames by 
generating a thumbnail image of each presentation frame and 
displaying the thumbnail images in list form. 

19. The video processing apparatus of Claim 17, wherein 

the user-selection unit stores the selected presentation
frame as a reference image into the specifying information 
memory, and 

the specifying unit specifies the presentation frames 
by detecting frames which are similar to the reference image
with respect to a location of a region in which a caption
appears.

20. The video processing apparatus of Claim 1, further 
comprising:

a recording unit operable to obtain a content and type 
information of the content, and to record the content to a 
recording medium in correspondence with the type information, 
wherein 

after the recording unit records the type information 
and at least part of the content, the content obtaining unit 
sequentially obtains the part of the content from the recording 
medium, and 

the specifying unit sequentially specifies start frames
present in the part of the content obtained by the content 
obtaining unit. 

21. The video processing apparatus of Claim 1, further 
comprising: 

a recording unit operable to obtain a content and type 
information of the content, encode the content, and record 
the encoded content in correspondence with the type 
information, wherein 

after the recording unit records the type information 
and encodes at least part of the content, the content obtaining 
unit sequentially obtains the encoded part of the content, 
and

the specifying unit obtains analyses of the encoded part 
conducted by the recording unit for the encoding, and 
sequentially specifies start frames present in the encoded
part using the analyses. 

22. The video processing apparatus of Claim 1, further 
comprising:

an updating unit operable to obtain a new version of 
specifying information corresponding to a specific type of 
content, and record the new version of specifying information 
to the specifying information memory. 

23. The video processing apparatus of Claim 22, wherein 

the updating unit obtains the new version of specifying 
information when connected via a communication network to 
a provider apparatus for providing specifying information, 
and judging that the new version of specifying information 
is available, and 

the new version of specifying information is recorded 
to the specifying information memory by updating a piece of 
specifying information stored therein corresponding to the 
specific type to the new version. 

24. The video processing apparatus of Claim 23, wherein 

the judgment as to whether the new version of specifying 
information is available is made each time the specifying 
unit processes the specific type of content. 

25. An integrated circuit for use in a video processing 
apparatus that specifies frames to be start frames of a 
plurality of viewing segments when segmenting a content, the 
video processing apparatus having a specifying information 
memory storing pieces of specifying information each showing
a feature of frames to be specified as start frames and each 
corresponding to a different type of content, the integrated 
circuit comprising: 

a content obtaining module operable to obtain a content; 

an information obtaining module operable to obtain type 
information showing the type of the obtained content; 

an extracting module operable to extract from the 
specifying information memory a piece of specifying 
information corresponding to the type shown by the obtained 
type information; and 

a specifying module operable to specify start frames 
present in the content, in accordance with the extracted piece 
of specifying information. 

26. A video processing method for use by a video processing 
apparatus that specifies frames to be start frames of a 
plurality of viewing segments when segmenting a content, the 
video processing apparatus having a specifying information 
memory storing pieces of specifying information each showing 
a feature of frames to be specified as start frames and each 
corresponding to a different type of content, the video 
processing method comprising the steps of: 
obtaining a content; 

obtaining type information showing a type of the
obtained content; 

extracting from the specifying information memory a piece
of specifying information corresponding to the type shown 
by the obtained type information; and 

specifying start frames present in the content, in 
accordance with the extracted piece of specifying
information. 

27. A video processing program for causing a device to specify
frames to be start frames of a plurality of viewing segments 
when segmenting a content, the device having a specifying 
information memory storing pieces of specifying information 
each showing a feature of frames to be specified as start 
frames and each corresponding to a different type of content, 
the video processing program comprising the steps of: 
obtaining a content; 

obtaining type information showing a type of the
obtained content; 

extracting from the specifying information memory a piece
of specifying information corresponding to the type shown 
by the obtained type information; and 

specifying start frames present in the content, in 
accordance with the extracted piece of specifying 
information. 






